Learning Goals

Step 1: Identify a data set and visualization tasks (due 4/19)

  1. Identify a data set of interest for your visualization project.
    The data set of interest is the annual air quality data collected at outdoor monitors across the US.
    Source: https://aqs.epa.gov/aqsweb/airdata/download_files.html

  2. Describe what kind of information can be derived through exploratory visualization analysis of the data set.
    Trends in air pollutant levels across years, and comparisons between the trends of individual pollutants.

  3. Identify the target audience for the visualization tool that you will build.
    The target audience is anyone interested in air quality and specific pollutants in the US. They could be researchers with expert knowledge or people with no previous experience in the field.

  4. Develop a list of visualization tasks for the data set.
    A. Visualize, on a US map, the mean quantity of a user-selected pollutant in a given year at outdoor monitors across the US.
    B. Visualize a scatter plot of the mean quantity of individual pollutants against year.
    C. Visualize a heatmap of the mean quantity of all pollutants across years.
    D. Visualize the distribution of all pollutants in a single year or across multiple years.
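
The summaries behind tasks B, C, and D could be computed roughly as follows. This is only a minimal sketch in base R: the column names follow the preprocessed file shown later in this document, while the toy data frame, its pollutant names, and its values are invented for illustration.

```r
# Toy data using the column names of the preprocessed file.
# Pollutant names and values are invented for illustration only.
toy <- data.frame(
  Parameter.Name  = c("Ozone", "Ozone", "Ozone", "PM2.5", "PM2.5", "PM2.5"),
  Year            = c(1999, 1999, 2000, 1999, 2000, 2000),
  Arithmetic.Mean = c(0.040, 0.050, 0.030, 12, 10, 14),
  stringsAsFactors = FALSE
)

# Tasks B and C: mean value per pollutant per year,
# returned as a pollutant-by-year matrix.
mean_by_year <- with(
  toy,
  tapply(Arithmetic.Mean, list(Parameter.Name, Year), mean)
)
print(mean_by_year)

# Task D: the values of one pollutant in one year,
# which could then be passed to hist() or density().
ozone_1999 <- toy$Arithmetic.Mean[
  toy$Parameter.Name == "Ozone" & toy$Year == 1999
]
```

A pollutant-by-year matrix is convenient here because one row gives the series for a trend plot, while the whole matrix could feed a heatmap.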

  5. Describe the data types present in your data set (temporal, networks, multivariate matrices, etc.).
    Multivariate data, including categorical, numerical, geographical, binary, and date variables.

Step 2: Apply Five Design Sheet Methodology (due 4/26)

  1. Apply Five Design Sheet Methodology.

  2. Describe potential visualization challenges.
    Visualization challenges include implementing the Shiny app, preprocessing the data, and laying out the visualization as a whole.
    Note that your design may go beyond what you will actually implement in your Shiny app.

Step 3: Describe Implementation Strategy (due 4/26)

Write a short paragraph describing how you are planning to implement your application and how different components of your visualization will be interacting with each other.
The Shiny app will have two parts: a sidebar panel with controls and a main panel with three plots. The sidebar panel will hold two sets of controls. The first set is for the map: a select input for the year, a radio button group for the region, and a select input for the pollutant. The second set is for the distribution plot: one multiple-select input for adding or removing regions and another for adding or removing years. The three plots in the main panel are a US map, a distribution plot, and a trend line plot. Hovering over the map shows the distribution of the selected pollutant for that region and year. Clicking the map locks that distribution in and adds a trend plot for the pollutant in that region. Users can change the region, year, and pollutant from the sidebar panel. Clicking the map again adds a new distribution to the distribution plot if the pollutant is unchanged, and adds a new trend line plot below the previous one.
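
The layout described above might be skeletonized as follows. This is a rough sketch, not the final implementation: every input and output ID, the region names, and the pollutant names are hypothetical placeholders, and the plots are stubs that would later draw from the real data.

```r
library(shiny)

# Hypothetical choices; the real app would derive these from the data.
years      <- 1999:2004
regions    <- c("Northeast", "Midwest", "South", "West")
pollutants <- c("Ozone", "PM2.5")

ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      # First set of controls: the map
      selectInput("map_year", "Year", choices = years),
      radioButtons("map_region", "Region", choices = regions),
      selectInput("map_pollutant", "Pollutant", choices = pollutants),
      # Second set of controls: the distribution plot
      selectInput("dist_regions", "Regions", choices = regions, multiple = TRUE),
      selectInput("dist_years", "Years", choices = years, multiple = TRUE)
    ),
    mainPanel(
      # hover and click on the map are exposed as
      # input$map_hover and input$map_click
      plotOutput("map", hover = "map_hover", click = "map_click"),
      plotOutput("distribution"),
      plotOutput("trend")
    )
  )
)

server <- function(input, output, session) {
  # Selections locked in by clicking the map, kept as a growing list.
  locked <- reactiveVal(list())

  observeEvent(input$map_click, {
    sel <- list(
      year      = input$map_year,
      region    = input$map_region,
      pollutant = input$map_pollutant
    )
    locked(c(locked(), list(sel)))
  })

  # Stub plots; each would be replaced by the real drawing code.
  output$map <- renderPlot(plot(1, type = "n", main = "US map (stub)"))
  output$distribution <- renderPlot(
    plot(1, type = "n",
         main = paste("Locked selections:", length(locked())))
  )
  output$trend <- renderPlot(plot(1, type = "n", main = "Trend (stub)"))
}

# Uncomment to launch the app interactively:
# shinyApp(ui, server)
```

Keeping the locked selections in a single reactive value is one way to let both the distribution plot and the trend plots react to the same click history.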

The preprocessed data looks like the following. Data from 1999 to 2004 will be used.

# read only the first 200 rows for testing purposes
testdat <- read.csv("airdata.csv", stringsAsFactors = FALSE, nrows = 200)
str(testdat)
## 'data.frame':    200 obs. of  13 variables:
##  $ State.Code      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ County.Code     : int  27 27 27 27 27 27 27 27 27 27 ...
##  $ Site.Num        : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Parameter.Code  : int  68101 68102 68103 68104 68105 68106 68107 68108 68109 88101 ...
##  $ Parameter.Name  : chr  "Sample Flow Rate- CV" "Sample Volume" "Ambient Min Temperature" "Ambient Max Temperature" ...
##  $ Year            : int  1999 1999 1999 1999 1999 1999 1999 1999 1999 1999 ...
##  $ Units.of.Measure: chr  "Percent" "Cubic meter" "Degrees Centigrade" "Degrees Centigrade" ...
##  $ Arithmetic.Mean : num  0.245 23.985 9.478 24.907 16.528 ...
##  $ State.Name      : chr  "Alabama" "Alabama" "Alabama" "Alabama" ...
##  $ County.Name     : chr  "Clay" "Clay" "Clay" "Clay" ...
##  $ Latitude        : num  33.3 33.3 33.3 33.3 33.3 ...
##  $ Longitude       : num  -85.8 -85.8 -85.8 -85.8 -85.8 ...
##  $ FIPS            : chr  "01-27" "01-27" "01-27" "01-27" ...
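
Restricting the data to the 1999 to 2004 study period could look like the sketch below. The toy data frame stands in for the full file and its values are invented; only the column names come from the structure shown above.

```r
# Toy stand-in for the full data set; values are invented for illustration.
toy <- data.frame(
  Parameter.Name  = c("Ozone", "Ozone", "Ozone", "Ozone"),
  Year            = c(1998, 1999, 2004, 2005),
  Arithmetic.Mean = c(0.041, 0.040, 0.038, 0.037),
  stringsAsFactors = FALSE
)

# Keep only the study period, 1999 to 2004 inclusive.
study <- subset(toy, Year >= 1999 & Year <= 2004)
print(study$Year)
```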

Reference: http://www.grroups.com/blog/r-plotting-heat-map-choropleth-on-us-county-level-map-using-ggplot2

Time series prediction analysis